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Abstract 

In this paper, we present a successful implementation of a subtraction-noise projection method 
into a simple, simulated data analysis pipeline of a gravitational-wave search. We investigate the 
problem to reveal a weak stochastic background signal which is covered by a strong foreground 
of compact-binary coalescences. The foreground which is estimated by matched filters, has to be 
subtracted from the data. Even an optimal analysis of foreground signals will leave subtraction 
noise due to estimation errors of template parameters which may corrupt the measurement of the 
background signal. The subtraction noise can be removed by a noise projection. We apply our 
analysis pipeline to the proposed future-generation space-borne Big Bang Observer (BBO) mission 
which seeks for a stochastic background of primordial GWs in the frequency range ~ 0.1 Hz — 
1 Hz covered by a foreground of black-hole and neutron-star binaries. Our analysis is based on a 
simulation code which provides a dynamical model of a time-delay interferometer (TDI) network. 
It generates the data as time series and incorporates the analysis pipeline together with the noise 
projection. Our results confirm previous ad hoc predictions which say that BBO will be sensitive 
to backgrounds with fractional energy densities below Q = 10~^^. 
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I. INTRODUCTION 



Currently, the first generation of large-scale laser interferometers is being operated to 
make a direct detection of a gravitational wave (GW) The primary targets of 

these detectors are compact-binary coalescences (CBCs), pulsars and supernovae. How- 
ever, the predicted event rate is so small that a detection of GWs by these instruments 
is highly unlikely. Their science goals are to push technological developments for a next 
generation of detectors and to place upper limits on GW amplitudes, thereby deriving at 



least to some degree restrictions on astrophysica 
deterministic sources like pulsars and binaries 



processes. These limits either refer to 
5| or to stochastic backgrounds of GWs 



which may have an astrophysical or cosmological origin j^. Although limits on stochastic 
backgrounds already become (weakly) scientifically relevant, the current upper limit of the 
background energy is about 10 orders of magnitude above a likely value for the cosmological 
background assuming standard inflationary models. At this stage, confidence in a detection 
event would be significantly increased by combining the data of many different detectors 
[3]. This technique has become a standard tool in GW data analysis, and it is the only 
method to coherently detect stochastic signals. Within ten years, the first generation of 
ground-based detectors will be joined or replaced by advanced LIGO - a second-generation 
ground-based detector - and LISA, which will be the first space-borne laser-interferometric 
GW detector In contrast to first-generation detectors and advanced LIGO, LISA 



has to cope with a totally different data analysis problem. 



JSA will be sensitive to many 



sources which combine to form a GW signal foreground [10|, llll . This foreground is formed 
by millions of galactic white-dwarf (WD) binaries and cannot be resolved completely. The 
unresolved, residual foreground acts as Gaussian confusion noise which impairs the detec- 
tion and analysis of other signals. Any future detector will have to take the source-confusion 
problem into account and find a way to solve or circumvent it. 

Even if a signal foreground can be resolved, the estimated signal waveforms will deviate 
somewhat from the true signals due to instrumental or confusion noise. If the estimated 
waveforms are subtracted from the data, then a residual signal spectrum, the subtraction 
noise, remains. Recently, a method was proposed to remove the residual foreground under 
certain conditions This method is based on a geometrical interpretation of signal anal- 
ysis. It allows to access a weak target like a stochastic GW background, irrespective of the 
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fact that the residual foreground - in that case resulting from inaccurate fitting of wave- 
forms from binary neutron stars (NS) and black holes (BH) - may be much stronger. The 
conditions which have to be fulfilled are that (1) an accurate model exists for the waveform 
of individual foreground signals, (2) the overlap between foreground and background signals 
is negligible or irrelevant, (3) there are not too many foreground sources compared to the 
amount of data being collected (this will be specified in section IVl|l and (4) the data is taken 
with a network consisting of at least 2 detectors. If the second condition is not fulfilled, then 
the removal of subtraction noise may deteriorate the waveforms of background signals in 
the data. The noise-removal algorithm which is geometrically defined as a noise projection, 
comes with many numerical challenges which could not be addressed in 12|. The purpose 
of this paper is to present a detailed discussion of the noise projection and to show how it 
can be implemented into a data analysis pipeline of a simulated future-generation detector 
network. 

The network model of our simulation is based on a design draft for a future mission, 
the Big Bang Observer (BBO) [l^. Its primary target is to measure the stochastic GW 
background with cosmological origin (CGWB) which was generated shortly after the big 
bang presumably during the inflationary phase [isl, [l^. For non-exotic (likely) models 
of the CGWB, the detector should be designed with peak sensitivity at lowest possible 
frequencies, since sensitivity towards stochastic backgrounds increases with decreasing signal 
frequency. This background will be overlayed at all frequencies by a foreground of CBCs 
which needs to be subtracted. At this point, one has to take into account the confusion 
noise problem. The galactic WD/WD foreground poses an intractable barrier even for 
future detectors. By consequence, its spectrum which reaches out to 0.25 Hz [17|, sets a 
lower boundary on BBO's detection band. Lower frequencies beyond the WD/WD barrier 
are excluded by a foreground of merging supermassiv black holes and too less data would be 
collected at these frequencies. Above 0.25 Hz, a remaining foreground of 10^ - 10^ NS and BH 
binaries has to be subtracted from the data. For BBO, the NS/NS mergers are the weakest 
foreground signals, and they are the most difficult to analyze and to subtract. Estimates 
for the number of merging NS/NS are highly uncertain. Extrapolating predictions of the 
galactic merger rate to the whole observable universe, one obtains values around 10^ for the 
NS/NS mergers per year As was explicitly shown in BBO is sensitive to 

virtually all NS and BH CBCs in the entire observable universe! Not surprisingly, detection 
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and analysis of CBCs build the secondary target of BBO. 

At an early stage of creating the simulation, it became clear that it would not be possible 
to demonstrate the projection method on a realistic foreground with 10^ or more events even 
if the search for the signals was excluded from the pipeline. Therefore, we chose to test the 
algorithm on a much smaller foreground consisting of 100 injected NS/NS systems. And 
even then it was necessary to shorten the observation time from the mission lifetime of 3 yrs 
down to 10^ s. In the end, our results have to be extrapolated to the full observation time 
of 10* s in order to derive a prediction of BBO's sensitivity towards the CGWB. 

Our paper is organized as follows. In section [Tll we describe in detail the fully-dynamical 
model of the detector network which underlies the simulation. In section IIIH we give an 
overview of the simulation pipeline and highlight that the network design of BBO is tightly 
linked to the demands of the data analysis pipeline. A brief description of the signal model 
which determines the CBC waveforms and an introduction to the Fisher matrix which is one 
of the basic quantities of the projection method are given in section IIVI In section |Vl we 
present a general framework how to simulate a stochastic signal in a network of space-borne 
detectors. The geometrical interpretation of statistics is outlined in section IVII including 
a description of the subtraction-noise projection. The optimal cross-correlation scheme for 
BBO is explained and investigated in section IVlTl Results are given in section IVml together 
with an extrapolation of BBO's sensitivity to an observation time which is equal to BBO's 
lifetime. 



II. THE NETWORK MODEL 

BBO consists of four independent detectors which orbit the Sun at 1 AU. Each detector 
is formed by three spacecrafts in a nearly equilateral triangular configuration (Fig. [T]) . The 
nominal distance between spacecrafts is 50000 km which entails that they follow slightly 
eccentric orbits with e ~ 9.65-10"^. Each detector performs a cartwheel motion on the orbital 
path completing one rotation in one year. All triangles are tilted against the orbital plane by 
60°. In addition, the relative distances between spacecrafts change by small amounts (0.01%- 
0.02%) during one year - the so-called breathing motion - and therefore the detectors cannot 
be treated as rigid objects. The motion of each detector can be described in a power series 



of the orbital eccentricity e 21| . Expanding the exact orbital equations (which can be found 
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in 



22I ]) up to second order, the position vectors read 



\ / 



^1 cos(2ai(t) - pij) - I cos(/?ij)^ 
+ e ■ AU ■ i sin(2ai(t) - A,) - | sin(Ai) 
\ -y/3cos{ai{t) - Pij) J 

^1 cos(3ai(t) - 2pij) - I cos(ai(t) - 2^^) - | cos(ai(t))^ 
+ ■ AU ■ I sin(3a,(t) - 2/3,,) + | sin(a,(t) - 2/3,,) - f sin(a,(t)) 
\ -1^3 [cos(2a,(t) - 2f3ij) - 3] / 



(1) 



where the first-order correction of the orbital path corresponds to the detector's cartwheel 
motion and the second-order correction describes the small relative motion of the spacecrafts. 
The angle ai{t) = 27r/orb^ + ^,(0) with /orb = 1/yr determines the location of the detectors 
i = 1, ... ,4 on the orbit of the Earth and (3ij = 2(j — l)7r/3 + fixes the position of 
spacecrafts j = 1,2,3 in each detector i. The constants govern the relative cartwheel 
phases of detectors. In the following we will identify a spacecraft ij by a single index j 
assuming that all formulas are independently valid for each detector. The initial detector 
positions are a(0) = (0, 0, 27r/3, 47r/3) and the internal configuration for each detector is 
given by ^ = (0,7r, 0,0). A vr difference of the first two components of ^ puts the first 
two collocated detectors in a star-of-David configuration. Otherwise, components can be 
chosen arbitrarily. A BBO spacecraft will certainly have a different design than a LISA 



spacecraft. However, it should be clear that the optimal sensitivity which is quantum-noise 




FIG. 1: The BBO network of LISA-type detectors 
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limited at high frequencies and acceleration-noise limited at low frequencies, does not depend 
on the topologies of the optical benches. Therefore, we assume a LlSA-type optical-bench 
design of the BBO spacecrafts and make use of well-known LISA results to evaluate BBO's 
instrumental noise and GW response. For LISA, a minimum number of four photocurrents 
per spacecraft has to be included in a detector simulation. Two of them measure frequency 
fluctuations yi of the light coming from a neighboring spacecraft via detector arm / and the 
other two measure intra-spacecraft signals which are denoted by zi where the photodiode 
(which records) Zi is found on the same optical bench than the diode yi (there are two optical 
benches, one for each link to a neighboring spacecraft). The inter-spacecraft signals yi are 
sensitive to GWs. The link index / assumes positive and negative values to discriminate 
between the two light-travelling directions fii of the detector arm (see Fig. [2]). Now, the 
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FIG. 2: Triangular BBO detector configuration 

noise spectrum of each photocurrent will be dominated by laser-frequency fluctuations and 
optical-bench noise 23|]. The situation is different for ground-based detectors where laser 
noise interferes destructively at a beam splitter towards the output port and suspension 
systems and isolation schemes attenuate the equivalent of optical-bench noise. The solution 
is to establish destructive interference electronically by appropriately adding and subtracting 
photocurrents in each detector. Some photocurrents have to be added to others with certain 
time delays 

DdVi = yi,dit) = yi{t- Ld{t)/c) 
Dd^Dd.yi = yi-d^diit) (2) 
= yi{t- Ld^ {t)/c- Ld^ {t - Ld^ {t)/c) I c) 
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where Ld{t) is the length of the optical path of link d which was travelled by light being 
detected at time t, and c is the speed of light. That is why the electronic interference scheme 
is known as time-delay interferometry (TDI). We mention that one has to take into account 
that the light propagation directions ni{t), —n_i{t) differ predominantly due to the detector's 
cartwheel motion. Also, the relative spacecraft velocities lead to minor corrections of the 
TDI combinations predominantly through relations of the form 

V {t + Ld, it) /c) ^ La, (t) + Ld, (t) ■ Ld, it) /c (3) 

with typical relative speeds 10 ■ e^AU ■ /orb ~ 5 ■ 10~^ m/s (a smaller contribution comes 



from a term which is proportional to e^AU^/Qj.jj/c, see appendix A in 2J] for details). The 
assumption for Eq. ([3]) is that the distance between spacecrafts does not change much during 
a light travelling time L/c. Henceforth, to make our descriptions more readable, we will not 
make explicit reference to the optical-bench noise. It should be automatically included into 
the argument whenever we mention laser frequency noise. Algebraically, it is always possible 
to treat optical-bench noise effectively as additional laser noise. 



Previous investigations led to the introduction of three generations o 



TDI combina- 



tions which cancel laser frequency noise based on various assumptions [2j, |25|]. The first- 
generation combinations are defined to cancel laser noise of a detector which does not have 
the cartwheel or relative spacecraft motion. If they were used to analyze realistic data, then 
residual laser noise would contribute to the total instrumental noise. The same is true for 
the modified first- generation variables which are based on the assumption that the detector 
is a rigid object which may perform the cartwheel motion. However, residual laser noise 
will be much weaker in this case, since it is exclusively caused by the relative motion of 
spacecrafts which is a second-order effect in terms of the orbital eccentricity e. Finally, the 
second- generation combinations take relative motions of the spacecrafts into account and 
cancel noise contributions which depend linearly on L^/c oc e^. We claim that choosing 
second-generation instead of modified first-generation TDI variables has a much weaker in- 
fluence in the case of BBO than for LISA. The reason is that the relative motion of BBO 
spacecrafts compared to LISA spacecrafts is a factor 10^ smaller. Investigating residual laser 
noise spectra in modified first-generation and second-generation combinations of BBO will 
show which generation has to be implemented. However, at least for the purpose of this 
paper, we just need to introduce the TDI combinations in their modified first-generation 
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form. 

In order to obtain a concise expression of the TDI combinations, we define new Doppler 
variables where a certain combination of intra-spacecraft hnks z is added to inter-spacecraft 
hnks y 

y'i = yi (4) 

In terms of these quantities, the laser-noise-free TDI combination Xi can be cast into the 
form 



/ / / / ^' 

- [^2,-2-33 + 'y-2,-33 + ?/-3,3 + ^3] 

Time delays commute in first-generation variables and therefore, semicolons in Eq. ([2]) have 
been substituted by commas. TDI Xi mimics an unequal-arm Michelson interferometer 
centered at spacecraft 1. Cyclic permutation of all indices leads to the definition of X2 
and X3 which represent interferometers centered at spacecrafts 2 and 3. Each of the two 
square brackets in Eq. (JSj) comprise terms which represent a complete round trip of light 
in clockwise and counter-clockwise direction. These two beams are then subtracted from 
each other to form the unequal-arm Michelson which can be represented geometrically as 
shown in Fig. [3l The instrumental noise of the three channels X^ is correlated. It is more 




^1 Q- 



FIG. 3: Graphical representation of the unequal-arm Michelson TDI combination Xi [271]. This 
combination of photo currents mimics the subtraction of two counter-propagating beams. 

convenient to use channels with uncorrelated instrumental noise, especially if information 
from all channels is combined to provide optimal sensitivity with respect to GWs. These 
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channels are known by the names A, T and can be defined in terms of the basis vectors 

A = ^ 

„ Xi — '2X0 ~\~ X'l , . 

7^ 

Each of these variables can be seen as one detector and so in principle, each LlSA-type 
detector has to be treated as a network which consists of three independent detectors. It 
turns out that these channels have quite different sensitivities to GWs and also, correlation 
measurements between them does not yield the same profit than what one may naively 
expect from a detector network. We will come back to this in a later section. 



III. OVERVIEW OF THE SIMULATION 



The simulation is organized according to the pipeline shown in Fig. HI The first step 
is to generate time series for the various Doppler streams (12 per detector, 48 in total). 
These data contain the instrumental noise and contributions from 100 CBCs. It is fairly 
simple to derive time domain models of the test-mass noise {Stm oc 1//^ in units of Doppler 
shift) and the shot noise (S'shot oc p) \2^. For the GW signal, we use a time-domain post- 
Newtonian (pN) approximation (Eq. ([8])). The data will depend on spacecraft motion and 
is generated consistently throughout the network by evaluating the GW phase at retarded 
time (Eq. fllip ). In contrast, the CGWB has to be generated directly in the frequency 
domain (Eq. fl^T]) ). Assuming a Gaussian model, the frequency-domain representation of 
the CGWB is completely determined by a function called the overlap-reduction function 
which essentially characterizes correlations between different output channels of the detector 
network (see section IV B|) . 

The second step is to search the data for the CBCs. For that purpose, one has to exploit 
features of the network in a distinct order. The CBCs are analyzed by coherently combining 
the data of all 12 independent channels. The detector arrangement significantly improves 
parameter estimation of signals which cannot be integrated over long enough times. Any 
poorly fitted broad-band signal could have a devastating effect on the mission goal: it 
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FIG. 4: (Color online) FM: Fisher Matrix, TS: True Signal, BF: Best-Fit 

may be that the respective residual noise even after applying the noise projection method 
is stronger than the CGWB spectrum. Initially, we implemented a Markov-chain Monte- 
Carlo (MCMC) algorithm which searched for the maximum of the posterior distribution 



29|, 



301 ]. However, our computational 



determined by our signal models and simulated data 
resources were not sufficient to perform a realistic search. Therefore, we decided to calculate 
the best-fit parameter values. This is a trick (the "Cheat" box in Fig. |4]) to avoid the CBC 
analysis. The idea is to calculate the noise vector on the template manifold which points 
from the true signal to the best fit. Vectors are defined in tangent spaces, so the best fit has 
to lie in close proximity to the true signal (high signal-to-noise ratio). Further details can 
be found in sections IIVBI and IVIBI The best-fit waveforms are subtracted from the data 
which then consists of instrumental noise, the CGWB and the subtraction noise. 

The third step is to carry out the projection method to remove the subtraction noise. 
The projection operator is defined in section IVI C[ This step is required for the following 
reason. A final correlation measurement of data streams of the two collocated detectors is 
supposed to lift the CGWB which is correlated to some degree in different channels above 
any other contribution to the data. Now, this only works if correlations between channels 
of the instrumental noise and the subtraction noise can be neglected. This is true for 
the instrumental noise, but it is not for the subtraction noise which is highly correlated. 
Remember, the subtraction noise corresponds to the difference of the true signal and the 
estimated signal which is a single quantity for the whole network (modulo detector transfer 
functions). 

Finally, as described in section IVIH we use cross-correlation results to obtain a signal-to- 
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noise ratio (SNR) for the CGWB with a given energy density. Knowing how the SNR scales 
with observation time, we derive a prediction for BBO's sensitivity towards the CGWB 
based on its full mission lifetime. 



IV. THE FISHER MATRIX IN A TDI FRAMEWORK 
A. The signal model and its derivatives 



ysis. 



33|. 



Since decades, people have been developing a geometrical interpretation of data a na. 
These models usually consist of a distribution carried by a certain model manifold 31 
If one considers Gaussian distributions, the metric of the statistical manifold is given by 
the Fisher- information matrix F^^. To calculate it, one needs a noise model and a signal 
model. The model T™ which determines the signal inside a TDI combination T depends on 
parameters A°. Concerning the noise model, one assumes complete knowledge of its (double- 
sided) spectral density S''^. In practice, the model for the noise spectral density itself would 
depend on a few parameters which would have to be estimated before searching the data for 
certain signals. 

In general, the Fisher matrix F^^ is associated with a TDI variable T and it is defined as 
a scalar product of derivatives of the signal model T™ with respect to the model parameters 
A. Defining <9q,T™ = dT"^/dX°', the Fisher matrix assumes the form 

Re(9„r-(/)9^t-*(/)) (7) 



= 2 d/ 







The model which determines a single CBC signal inside our simulation depends on 5 extrinsic 
parameters (A" = luminosity distance r of the source to the detector, A^ = declination 
6, A^ = right ascension 0, A"^ = polarization angle ip, A^ = inclination angle l of binary 
orbit with respect to line-of-sight) and 4 intrinsic parameters (A^ = orbital phase (pc, A^ = 
coalescence time tc, A'' = total mass M = Mi + M2 of the binary system, A^ = reduced 
mass n = M1M2/ {Ml + M2)). In other words, we neglect the spin of the two binaries and 
assume zero eccentricity of their orbit. If two signals have to be parameterized, then A^ 
would be the distance parameter of the second signal etc. A simulation is completely based 
on signal models (there is no true data). Still, we have to distinguish between the model 
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for the simulated signal T*^ which adds to the noise T" to form the total data T*^, and the 
model T™ which is used to analyze the data. In general, one may choose different models 
to generate and analyze data. Once generated, is not considered as a function which 
depends on parameters, but as a set of fixed numbers. Here, we use the same model to 
generate and analyze data, and therefore the next paragraph gives a description of both. 

Since Eq. ([5]) tells us that a TDI variable is a linear combination of the Doppler signals Ui , 
derivatives of a TDI variable with respect to certain model parameters can be expressed as 
a sum of derivatives of our signal model |/™ for a single Doppler signal. Therefore, it suffices 
to calculate and present the derivatives d^yf^. The Doppler signal is a projection of the 
GW tensor onto the light-travelling direction fii. In transverse-traceless coordinates, the 
matrix representation of the GW tensor contains the two GW polarizations h^, which 
are functions of the distance r and all intrinsic parameters: 



ft+(() 




'1 + COS^(i)) ■ COs(0(t) + 0c 



2 cos(i) ■ sin(0(t) + 0c 



(8) 



We implement the restricted waveform which neglects all harmonics higher than twice the 
orbital frequency and whose amplitude is determined by the chirp mass Aic = GM'^^^ jj?/^ / c^. 
The evolution of the GW phase is given by a 3.5 post-Newtonian expansion 

^ 7 
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k=0 



PkT 



(5-fc)/8 



(9) 



with r = {rjc^itc - t))/{5GM) and expansion coefficients [3J] 
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which are most suitably expressed in terms of the symmetric mass ratio i] = fi/M, and 
C = 0.57721566 ... is Euler's constant. In total, the GW phase depends on the mass 
parameters M, fi and the chirp time tc- At some point we had to choose a convenient mass 
parametrization. Obviously, it would have been possible to use r] instead of /i, and indeed 
in many situations this could be a good choice. However, for comparable mass binaries like 
neutron-star binary systems (which is the only kind of signal we included in our simulation), 
the mass ratio t] has the odd property to be close to its maximum value ?7max = 0.25 which 
holds for equal-mass binaries. By consequence, in our case, probability distributions for r) 
will not be Gaussian and the distribution of other parameters may also exhibit non-Gaussian 
features through parameter correlations. So, without further investigations we decided to 
use the reduced mass fi as second mass parameter. As we will show later, the distributions 
for M and fi are highly correlated even for strong signals which complicates the calculation 
of the inverse of the Fisher matrix. It would be interesting to investigate whether a different 
mass parametrization behaved better in this respect. 

Now, projecting the GW tensor, we arrive at the following form for the Doppler signal 
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|2l|: 

2{1 -k{9,^)-ni{t)) 

J2 e/(^, 0, V^) ■ hj{t, - k{e, 0) ■ f,{Q/c) 

I=+,x (11) 
/=+,x ^ 

The Doppler signal is the GW induced frequency change of hght which is sent at ts = 
t — Li{t)/c from a spacecraft at position r's(ts) and received at time t by its neighbor at 
position r'r(t). "T" denotes a transposition. The two polarization matrices 6+, Gx are 
derived from their simple form in the GW propagation frame by a rotation 'D(6', 0, il)) into 
the solar barycentric frame whose coordinates are used to describe the spacecraft positions. 
The rotation matrix is shown explicitly in the appendix of 2^. The propagation direction 
of the GW is given hy k = — (cos(6') cos(</)), cos(^) sin(</)), sin(6')). To calculate the Fisher 
matrix, we generate time series of derivatives 9^T™ with T G {A, T} and subsequently 
apply a fast-Fourier transform (FFT) to obtain the amplitudes which govern the Fisher 
matrix components. Taking derivatives with respect to r and 0c is trivial. The same is 
true for all other extrinsic parameters, although the result is rather complicated due to the 
waveform's dependence on the angles 6', 0, il) (see Eq. ( fTTl) ). However, we think that it is 
instructive to present derivatives of the GW phase, since for all three non-zero derivatives 
of the phase, the result can be cast into a form which resembles the pN expansion Eq. ([9]) 
of the phase. The corresponding expressions can be found in appendix lAl 

Finally, one has to specify models for the noise spectral densities S^^{T). The spectral 
densities for the uncorrelated channels read 26 1 

= 16sin2(27r/L/c) 

■ (3 + 2 cos(27r/L/c) + cos(47r/L/c))5*"' 

(12) 

+ 8 sin2(27r/L/c)(2 + cos(27r/L/c))^^'^°* 



5^(r) = 128sin2(27r/L/c) sin^(7r/L/c)^*'^ 

+ 16(1 - cos(27r/L/c)) sin2(27r/L/c)^'^°* 
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with test-mass noise S**™ = S'^^'^/(27r/c)^ and shot noise S^^°^ = ^o/-Prcc(2vr//c<Jo)^ in terms 
of double-sided spectral densities. The standard design of BBO provides a spectral density 
of test-mass acceleration S^'^'^ = 9 ■ 10"^^ m^/s^/Hz which in our simulation is assumed to be 
equal for all test masses and light power Prcc = 9 W which is received by a spacecraft from 
one of its neighbors. The carrier frequency of the laser is ujq = 5.31 ■ 10^^ s~^. It is sufficient 
to express the noise models in terms of the nominal arm length L = 50000 km, because 
estimation errors of the noise will probably exceed the systematic errors due to implementing 
a simplified model. This is certainly true in our simulation where an estimation of the low- 
frequency test-mass noise spectrum would be based on a few frequency bins. In contrast, 
the simulated noise T° is based on a combination of individual Doppler signals which then 
depends on detector motion and asymmetry. 



B. Numerical evaluation of Fisher matrices 



The complete simulated CBC foreground is composed of 100 NS/NS systems which oc- 
cupy frequencies between 52mHz and 2.2Hz. Restricted by computational power of single 
notebooks (a cluster version of the code is being developed), we could simulate data with an 
observation time T = 10^ s and a sampling frequency of fg = 5.24288 Hz which essentially 
fix the frequency range of the injected binaries. Parameter values for tc, r and the two 
mass parameters of the CBCs are drawn from non-uniform priors. Values for M and fi are 
derived from normal distributions of the individual masses Mi, M2 which are centered at 
IAMq, distance values are restricted to yield sensible signal-to-noise ratios and values for 
the chirp time are determined by assuming a certain distribution of signals over frequency 
bins. Assuming a Newtonian evolution of the orbital frequency of the binaries, the number 
of signals per frequency bin has to obey a distribution A^(/) oc 1/ /^^^^ near BBO frequencies 
35(1 • However, in our simulation, we draw initial frequencies from a N{f) oc 1// distribution 
which yields a few systems at higher frequencies, but otherwise has no significant effect on 
our analysis. The frequency distribution of the 100 NS/NS signals is displayed in Fig. [51 
The reason for taking greater care in frequency priors is that the signal distribution in fre- 
quency space has a significant impact on correlation values between parameter distributions 
of different CBCs especially since the sky resolution of the detector network is comparatively 
poor for short observation times. The systems with highest frequencies have chirp times tc 
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FIG. 5: (Color online) Initial distribution of NS/NS signals over frequency bins. 

which are of the order of a few T and therefore the signal spectrum which is shown in Fig. O 
exhibits multiple quasi- monochromatic peaks at low frequencies and a few chirp "plateaux" 
at higher frequencies. 
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FIG. 6: Signal spectrum of the A channel of the first detector. At low frequencies up to 0.7 Hz, 
the spectrum features distinguishable, mildly chirping signals. At high frequencies, many signals 
overlap to form a plateau of chirps. The signals' merger times were chosen such that, within 
T = 10^ s, signal frequencies never become greater than 2.6 Hz, which is half the sampling frequency. 
Note that at frequencies above 0.5 Hz, just every 20th frequency bin is plotted to reduce figure size. 

Before being able to numerically evaluate the Fisher matrix, one has to search the simu- 
lated data for the injected binaries and estimate their parameter values. The signal deriva- 
tives in Eq. ([7]) have to be evaluated at the estimated parameter values A. Implementing 
uniform priors of parameter distributions at this point, an optimal analysis is performed 
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by searching the hkehhood function C{X) oc exp(-l/2 ^.(T;*^ - T^{X)\Ti' - T^{X))) for 
its gfobal maximum 36|, l37|]. Here the sum has to be taken over all independent network 
channels (the BBO network furnishes 12 independent channels, 3 per detector). As was 
argued in [l^, no existing computer or network of computers could accomplish that search 
for a realistic foreground formed by 10^ — 10^ CBCs. Even searching simulated BBO data 
for 100 signals including the estimation of parameters is a difficult task which optimally 
requires a high-end cluster. Our work is not intended to make any propositions how to 
perform that search let alone to carry it out. So we have to work around the problem. The 
idea is that knowing the realization of the instrumental noise in a simulation run - which 
we do, since we generate the noise T"^ and add it to the signals - and assuming Gaussian 
distributions for the signal parameters with small estimation errors (high SNR), one can 
use the following equation to calculate the estimation errors (the difference between the 
maximum likelihood values A" and the true parameter values A" of the signals) 
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5A" = r"^'5^(7:-|9,7:-) (13) 



Again, the sum has to be taken over all independent channels % of the detector network. The 
parameter errors depend on the inverse of the network Fisher matrix (F"^) = C^i^ap)"^ 
with F^^ = {da%^\dj3T/^) . The parameter estimation errors are added to the true parameter 
values of the signals injected into the simulation pipeline, and the Fisher matrix can finally 
be evaluated. A brief introduction into the geometric interpretation of the Fisher matrix and 
how to make use of it can be found in section |Vl] which also explains Eq. ( 1131) . The reader 
may have worked with a close relative of Eq. ( fT3l) in another context. It is a gene ralization 



of the jF-statistic equation to obtain best fits of its 4 amplitude parameters 39|]. The JF- 
statistic is based on templates which are linearized with respect to r, l, ip and 0c- It is 
straightforward to show that if one substitutes the complete data for the noise T^^ in 
Eq. ( IT3|) . then the equation directly yields the best fit of any parameter which enters linearly 
into the definition of the template. Our model does not have linear parameters, but many 
alternative template models do have. 

Notice that by calculating the best fits, we neglect the detection problem, i.e. we assume 
that all binaries in the data are detected. We should also mention that our method to 
calculate the best fits represents an optimal analysis scheme. The optimal scheme is to 
search simultaneously for all signals. A more realistic search which requires much less 
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computational power is the hierarchical search. There, one detects signals one by one, 
starting with highest SNR and "digging" down the signal spectrum until the last binary 
is identified and measured. Unresolved binaries act as confusion noise. Identified binaries 
are subtracted from the data so that during the hierarchical search, parameters of already 
detected binaries are constantly refined as the confusion noise decreases. This scheme has 
been studied by means of a self-consistent recursive evaluation in 12|. In contrast, the 
optimal search is not corrupted by confusion noise. Correlations between different signals, 
which lead to confusion noise in the hierarchical search, are incorporated into the signal 
model of the optimal search. The only possible shortcoming of an optimal search is that it 
may fail to accurately estimate parameter values of the model (including to find the right 
template- manifold dimension which depends on the number of detected signals). 



C. Multi-signal templates and Fisher-matrix inversion 

Given 100 signals which each depend on 9 parameters, the total Fisher matrix F^^ for 
each channel i becomes a 900x900 matrix. It turns out that inverting the Fisher matrix 
is a highly nontrivial task. In our simulation, inverse Fisher matrices are used in two 
different ways. First, we need the network matrix evaluated at the true parameter values 
to compute the estimation errors by means of Eq. f[T^ . Second, the inverse Fisher matrices 
of individual channels evaluated at the estimated parameter values have to be computed 
to define the subtraction-noise projector in Eq. (l42l) . To start with, we outline a generic 
inversion scheme which, in the end, does not solve all problems. However, this method still 
forms the foundation of the complete solution. In the next part of this section, we omit the 
channel index i, since the described method is used in the same way to invert channel and 
network Fisher matrices. 

The inversion procedure starts with the computation of a new matrix F^^ = 
^0/3/ ^/^aa^f3i3 ^uch that F^^ = 1 and all off-diagonal components have an absolute value 
smaller than one. This step is necessary since in our simulation the numerical range of 
Fisher-matrix components is 10~^° — 10^. By consequence, ratios of different eigenvalues 
of the Fisher matrix can assume large values. Such a matrix is called ill-conditioned and is 
known to be hard to invert numerically, because tiny inaccuracies of a few matrix compo- 
nents may have a great effect on the eigenvalues or the components of the inverted matrix. 
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These inaccuracies are unavoidable due to limited machine precision. It turns out that F^^ 
is still ill-conditioned and cannot be inverted using standard double-precision variables. At 
this point, one has to implement a multi-precision package into the code. We found that 
CLN (Class Library for Numbers) 40|] provides all required functions. Using a 50-digit pre- 
cision, the scaled matrix can be inverted following its LU decomposition. Next, the inverted 
matrix is scaled back to form the inverted Fisher matrix F"^ = F'°^/ ^T^^T^^. We expect 
that the degree of ill-conditioning decreases significantly once much longer observation times 
can be simulated. To explain this, we need to have a look on the correlation matrix of a 
single binary. The correlation matrix is derived from the covariance matrix F"^ in the same 
way than F^^ was derived from F^^. As can be seen in Tab. [H correlation is especially strong 
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TABLE I: Network correlation matrix for a single NS/NS at 0.57 Hz. Some correlation coefficients 
strongly depend on the observation time which is T = 10^ s in this case. 



between r ^ t, (p^ ^ tc and M ^ fi. Strong correlations indicate that by changing one pa- 
rameter from its best-fit value, the loss in accuracy of the waveform fit can be compensated 
by changing the value of the other parameter of the correlation pair. Therefore, at first 
sight, it seems to be obvious that these pairs may be strongly correlated. The question is, 
under which circumstances these correlations become weaker. The pair M ^ fi decorrelates 
once a considerable amount of the chirp is observed and the signal-frequency change accel- 
erates. It is well-known that the low-frequency evolution of CBC waveforms is completely 
determined by a single mass parameter, the binary's chirp mass Aic- For those waveforms. 
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implementing a model which needs two mass parameters must exhibit maximal correlations 
between them. A similar argument can be invoked for the pair (pc ^ tc- By consequence, 
correlation matrices of signals with higher frequencies have lower correlation coefficients for 
these pairs. A decorrelation of r ^ t (and weakening of many other correlation coefficients) 
is observed as soon as the orbital motion of the detectors leads to measurable amplitude 
and phase modulations of the signal. With maximal observation times of T = 10^ s, we are 
not able to study the impact of the Doppler shift on parameter estimations. However, as 
we are not particularly interested in the quality of best fits, but accept any quality as long 
as a projection of subtraction noise can be carried out successfully, we do not investigate 
parameter correlations further in this paper. It turns out that the best fits are accurate 
enough for this purpose. 

As mentioned in the beginning, the inversion algorithm as presented in the last paragraph 
does not provide a complete solution of the inversion problem. The reason is that a 900x900 
matrix determined by multi-precision components needs too much memory and even if it 
can be kept in memory during runtime (e.g. by implementing specifically designed inversion 
schemes 41|), then the inversion would take too much time. To solve this problem, we have 
to understand a little more about Fisher matrices. Consider a matrix which includes 
copies of single CBC templates. Each CBC is determined by P parameters. In our case, the 
number of templates is A^ = 100 and the number of parameters is P = 9. Let us introduce 
the "confused" Fisher matrix 



fg^ ... ^ 









N 



(14) 



yo ... g'^^j 



It is a block matrix which contains the Fisher matrices g^^ with G {1, . . . , P} of A^ 
CBCs on its diagonal. In other words, it differs from the total Fisher matrix by neglecting 
correlations between different signals. We call it confused, because whenever this matrix 
is applied instead of the total matrix, it is like our knowledge of correlations between dif- 
ferent signals is ignored. Correlation coefficients become random variables in the analysis 
pipeline leading to confusion noise. We claim that it is legitimate to use the block matrix 
when calculating estimation errors by means of Eq. 0131) . To support this claim, one has to 
investigate the impact of the correlation coefficients on the eigenvalues of Fisher matrices 
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of individual CBCs. Fisher matrices (their inverses to be precise) define a multi-variate 
Gaussian distribution in parameter space. Their eigenvalues correspond to the variances of 
the distribution along its principal axes. The question is what happens to the distribution 
defined by a matrix g^^ when the correlations between signal k and other signals are incor- 
porated into the model. This problem can be treated with perturbation theory similar to 
perturbations of a Hamiltonian (here, T^) which is weakly perturbed by interactions 



c 



gN-l,N 



(15) 



where (7^^ are the correlation coefficients between signals i,j G {1, . . . , A^} and "T" denotes 
the transposition of a matrix. The proper condition to justify the perturbation approach 
is that correlations g^^ have to be small compared to differences of eigenvalues of g^ and 
g^ , which is the case for any combination of signals in our simulation. For this particular 
form of perturbation C, theory tells us that the Ith eigenvalue of the Fisher matrix g^ 
(/ G {1, . . . , P}) is perturbed at second order in C according to 

^ ^ \/kP\n'''\ii^\\'^ 

4,=C + EE ' ' (16) 

ij^k j=i 

where \kl^) are the P eigenvectors of the Fisher matrix g^ with eigenvalues The pertur- 
bation of the eigenvectors reads 

\kl) = Ikf) + E E ,o _ o ^ ■ + OiC^) (17) 

i^k j=i 

Therefore, up to first order in C, one can say that the multi-variate Gaussian does not 
change the lengths of its major axes. Instead, the distribution is rotated and the small 
rotation angles are given by the fraction in Eq. f|T7|) . This means that, when using the block 
matrix F'' instead of the total Fisher matrix F, the coordinate basis |(9^7^™) in Eq. ( |T3l) is 
misaligned with respect to the inverted Fisher matrix (F°)~^, and that the parameter errors 
lie in false "directions", but giving rise to a comparable accuracy of the waveform fit 
up to order 0{C). That is the reason why we may use the block matrix at this point. 
The projection method described later does not depend on these rotations of the parameter 
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space. The benefit is that we can easily invert the confused Fisher matrix by inverting 
Fisher matrices of each signal. In principle, we could even correct the misalignment by 
rotating the inverted matrix with rotation angles {kl'^\gij\'i'j'^) / {^ki ~ ^ij) provided that we 
also calculate the eigenvalues and eigenvectors of all Fisher matrices g^. As a corollary, we 
add that correlations between different signals always lead to a loss of Fisher information 
represented by the determinant of the Fisher matrix 

/ N N P |/M0|„fciL-„-0\|2\ 

det F = det ro 1 - 5: 5: 5: ^' + 0{C^) (18) 

V k ijtk l,j=l 'ikl'iij J 

The second term in round brackets is always positive and well defined, because Fisher 
matrices are positive definite. So, the decrease of the determinant is a second-order effect in 
correlation coefficients which is further suppressed by the Fisher information of particular 
template parameters (i.e. the respective eigenvalue and therefore, especially in the high 
SNR regime, one may neglect information loss. 

Unfortunately, we cannot make use of the same simplification when dealing with Fisher 
matrices which define the projection operator. There, directions reflect the amount of cor- 
relations between the template derivatives and actual subtraction noise in the data. These 
very correlations have to be exploited to facilitate removal of the subtraction noise. Our 
strategy in this case is to reduce the dimension of the template manifold by projecting a sub- 
set of all signals. Namely, we project all signals which possess power in the frequency range 
which contributes most of the SNR to the final correlation measurements of the CGWB. 
More details can be found in section IVIII A[ 

V. THE STOCHASTIC BACKGROUND 

In this section we sketch out how to simulate the CGWB. The stochastic data is generated 
directly in the frequency domain starting with one channel and then taking correlations be- 
tween channels into account to derive data for other channels. The function which describes 
the correlations is called the overlap-reduction function (ORF) 'jabif) between channels a 
and b of the same detector or different detectors. As we are going to learn in section IV Bl 
correlations between channels A, E, T of the same detector are negligible. We introduce a 
new definition for the ORF which does not make any attempt to factor out a channel's trans- 
fer function. This is the most convenient approach based on a dynamical detector model 
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where relative detector and satellite motion has to be taken into account. Henceforth, since 
the T channel does not furnish significant sensitivity with respect to a measurement of the 



CGWB (see 



42| and section IV Bp , it will be excluded from the branch of the pipeline which 



processes the CGWB. 

A. Generation of stochastic backgrounds in detector networks 

A zero-mean Gaussian background is completely characterized by its second-order mo- 
ments, i.e. the auto- and cross-correlations of TDI channels. In our case, we just have 
to include channels of the two collocated detectors in our investigations since correlations 
between other detectors are negligible (correlations fall off rapidly if the distance between 
detectors becomes much larger than the length of the GW). Therefore, based on correlation 
properties of a simulated stochastic background signal in detector channels a and fe, 
notably 

^Re {(7;^/)K'^(/)]*)} = s\f)^am (19) 

we have to find an algorithm to calculate the stochastic TDI signals T^{f) produced by all 
channels of the two collocated detectors. We define the ORF '^ah as a real-valued function, 
which essentially establishes a convention how correlations between channels are evaluated 
(see Eq. (1461) ). The ORF governs the strength of cross correlations (a ^ b) and autocorre- 
lations (a = b) in the frequency domain. Here, the observation time T is used to convert 
amplitude squares into spectral densities and the (double-sided) background spectral density 
S^{f) of the GW amplitude is related to the fractional energy density Q by 



SV) = |f|-^ (20) 



The background is assumed to be isotropic and to have a white energy spectrum with 
fiducial value ^{f) = 10~^^. In our simulation, the value of the Hubble constant is Hq = 
72 km/s/Mpc. 

Now, the idea is to generate a stochastic signal with the correct spectrum in one channel 
and then to proceed with other channels by taking mutual correlations into account. The 
equations used to generate the background are 
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7ii(/) V2V V 7ii(/)/ 

For each frequency, random values ai(/), a2(/), and h2{f) are drawn from a normal 

distribution A/'(0, 1). It is not necessary to extend Eq. ( I2T1) to three or more channels, 
because in the end correlations are evaluated between independent pairs Aq ^ Ai and 
Eq ^ El of the two collocated detectors. In the next section, we show how to obtain the 
overlap- reduction function 'jabif) in a TDI network. 



B. Overlap-reduction function 

As already mentioned, we need a procedure to calculate the overlap-reduction function 
(ORF) between arbitrary detector channels at all frequencies. In previous publications, the 
ORE has been defined by [4^ 

iZV) = ^ [ e2-'/^^^>(Fj-F+ + F^F-). (22) 

Here Ax denotes the separation vector between the detectors, f2 is a unit-length vector 
on the two-sphere S'^ and F^'^ are the response functions of detector a to the + or x 
polarization. This function is normalized such that its value is unity for two coincident, 
aligned Michelson detectors with perpendicular arms. The motivation for this definition 
was to separate the optical properties of the detectors from their geometric properties which 
determine the response functions. The ORE 7^^'^ is used to calculate correlations between 
projected GW amplitudes in two detectors, and then optical transfer functions can be used 
to derive the correlations of the detector outputs. 

There are a few reasons why the ORE in Eq. (122!) cannot be used under general circum- 
stances. Eirst, it assumes that the response functions are constant in time. Obviously, this 
is not the case for space-borne detectors, where test masses which are on individual orbits 
around the Sun move relative to each other. Also, the separation vector between different 
detectors does not have to be constant. Beyond the long- wavelength limit which demands 
that the length of a GW is much larger than the dimension of detectors, it is in any case 
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difficult to agree upon a detector position. In other words, light-travelling times between 
test masses are neglected in 7^^*^. Therefore, we propose a slightly different definition of 
the ORF which is not normalized and which is a direct representation of the correlation 
strength between channels. We will not make any attempts to separate optics and geome- 
try, since they are tightly linked in TDI detectors. The complete detector dynamics can be 
incorporated into a frequency-domain correlation function in the following way: 

1. Inject a time-domain delta signal 6{t, t') = ■ sinc(7r/s(t — t')) (/g being the sampling 
frequency) associated with a polarization and sky direction into each detector of the 
network. Propagate it from a common origin so that relative phase shifts of the GW 
at different detectors are automatically taken into account. Make sure that t' is larger 
than any light travelling times between detectors, otherwise the peak does not appear 
in all data streams. 

2. Record the outputs T^+'^'iO, (p; t) of all TDI channels. 

3. Apply an FFT to the data and thereby obtain the complex- valued transfer functions 
T^^'^{6, (j)] f) of the TDI channels. In our case, the transfer function can be used to 
map GW amplitudes to TDI Doppler outputs in the frequency domain. 

4. Multiply the transfer functions of different channels and average the product over 
many sky directions and polarizations to obtain the ORF. 

In summary, the ORF for a TDI network is defined by 

latif) = (y1 ^e{f-\f,9,^)[f-\f,e,^)r]\ (23) 

\^=+.x / s.a. 

In our case the phase factor ^'^'^^f^^^/^^ arising from the time delay between the detectors is 
already included in the Doppler signal outputs of the TDI. We found that averaging over 
200 random sky directions provides very accurate results. Values for the right ascension 
are drawn from a uniform distribution W(0, 27r) whereas values for the declination 6 are 
obtained by calculating the arcsin of values drawn from a uniform distribution W(— 1,1), 
which entails an isotropic distribution of corresponding sky directions. 

The set of curves displayed in Fig. [7] shows the ORFs between channels A,E and T of 
the two collocated detectors in comparison to the channels' sky-averaged squared transfer 
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FIG. 7: The figure shows the overlap-reduction functions 7 between channels A,E,T of the collo- 
cated detectors compared with the corresponding squared transfer functions. The sky average was 
taken over 400 random sky directions. The squared transfer functions of each channel type are 
identical in both detectors. At frequencies below 0.1 Hz all displayed curves related to channels 
A,E are proportional to the T channel curves are proportional to f^^l The response of the T 
channel to GWs is very poor below 2 Hz. 



The response of the T channel lies well below the response of channels A,E at frequencies 
up to 2 Hz. More specifically, within the correlation band 0.1 - 0.4 Hz (see section IVHp 
of BBO, the T channel response (i.e. expressed as STFs or ORE) is smaller by a factor 
~ 2 ■ 10^(0.4 Hz//)^ than the response of the other two channels. This again is the reason 
why our correlation analysis will not include the T channel. 

In this paper, the ORFs are defined as correlation functions between TDI Doppler chan- 
nels which are - at low frequencies - proportional to the square of the second (e.g. A,E) or 
even higher derivative (e.g. T) of the GW amplitude. To find a better measure of stochastic 
GW background correlations (as projection/combination onto two different TDI channels), 
one has to compare the ORF with the channel responses to GWs. In Fig. [8], the ORF be- 
tween the two A channels of the collocated detectors is shown normalized by the geometrical 
mean of the two respective STFs: 



functions (STF). 
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FIG. 8: The figure displays the normalized ORF between channels ^ of the two collocated 
detectors. The normalized ORF has a maximum at / = OHz and then oscillates around zero 
with constantly decreasing amplitude towards higher frequencies. The ORF as defined in Eq. (j23p 
governs the correlation of TDI outputs. In contrast, the normalized ORF is a better representation 
of the correlation of a stochastic GW signal as input to the detectors. However, one has to keep 
in mind that all frequency-domain functions are obtained via FFT from a dynamical model and 
therefore it is not possible in a simple way to deduce GW correlations from measured TDI outputs. 

At frequencies up to 0.5 Hz, the ORF and the STFs show identical response. This region 
is called the long- wavelength regime where the length of GWs is much larger than the 
dimension of the detectors. Beyond the long- wavelength limit, the ORF becomes weaker 
at higher frequencies compared to the STFs. As we expound in section IVIIl correlation 
measurements with BBO detectors will be dominated by contributions of frequencies in the 
long- wavelength regime. 

The graph in Fig. [9] allows to draw another interesting and important conclusion. It 
shows the normalized ORF between channels A, E of the same detector. Obviously, cross 
correlating A channels of two different (collocated) detectors provides much more sensitivity 
than cross correlating independent channels of the same detector. In fact, one can show that 



if the detectors were equilatera' 



same detector would vanish 
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triangles, then the ORF between channels A and E of the 



451 . Sky-averages of these two channels are orthogonal to 



each other in terms of their response to an isotropic stochastic GW background, but they do 
permit no n- vanishing correlations arising from higher moments of anisotropic backgrounds 
(e.g. the hexadecapole moment). In our simulation, asymmetries in the triangular detector 
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shape are responsible for a residual sky-averaged correlation strength. The detector asym- 
metry is of the order of the orbital eccentricity of the satellites. To make this effect stronger 
than artificial anisotropics resulting from a sky average over a finite number of sky direc- 
tions, we chose a model detector with increased eccentricity value (e = 0.04) to generate 
Fig. O Since BBO has collocated detectors, the residual correlation will not be exploited, 

0.06 I . • • . • . • n 




Frequency [Hz] 

FIG. 9: The figure shows the normalized ORF between channels ^ Eq of the same detector. 
Correlations between ^ Eq are much weaker than correlations Aq ^ ^i. Still the residual 
correlation could lead to an increased sensitivity of a single detector to isotropic stochastic back- 
grounds. Here, the data is based on a model detector with orbital eccentricity e = 0.04 to make 
the effect of a residual response due to asymmetries stronger than artifical anisotropics resulting 
from a sky-average over a finite number of sky directions. 

because in that case one can form more efficient correlation schemes based on channels of 
different detectors. Also, when generating the CGWB data, we may neglect correlations 
between channels A and E which justifies the two-channel approach in Eq. fl2T]) to the BBO 
network. 

In terms of the calculated ORFs 7yioAi(/), IeoeAJ) and STFs TAqAoI/), 7AiAi(/), 
lEoEoif) aiid ■jEiEi (/)) we are able to calculate the stochastic background ]^(/) and i?Q ^(/) 
by means of Eq. fl^Tl) . In Fig. [TUl its spectral density in channel Aq is shown together with 
the instrumental-noise spectral density. One can directly infer from the graph that the SNR 
is about 0.1. A correlation measurement has to raise this value to 5 at least. By simple 
arguments, we can determine the conditions under which the CGWB becomes detectable 
by means of correlation measurements. As a first approximation one can say that the corre- 
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FIG. 10: Simulated CGWB with fractional energy density = 10~^^ in comparison to the instru- 
mental noise. A smaller observation time, T = 2 • 10'^ s, was chosen to generate the curve, but the 
spectra shown are independent of the observation time. The noise model is exclusively used to 
compute Fisher matrices, projection operators and the SNR (see next section). 

lation measurement effectively shifts the CGWB curve in Fig. [10] upwards by a factor VT. 
Adding the SNRs obtained from two independent correlation pairs, we find that an CGWB 
with energy density Q = 10"^^ can be detected if correlation times exceed 10^ s. Section IVlII 
provides more details of the correlation measurement, and we are going to show in section 
IVIIII that our first guess gives the right order of magnitude for the minimal correlation time. 




VI. TEMPLATE-BASED PROJECTIONS OF SUBTRACTION NOISE 



In this section we introduce a differential geometric point of view based in principal on 
the works of S.-I. Amari 321 but adapted to the needs of data analysis of gravitational 



wave signals as is shown in [46|]. The main focus is on presenting methods to deal with the 
inevitably occurring errors when subtracting the best-fit waveforms from the data stream. 
The residual errors have comparable spectral densities than the instrumental noise, because, 
roughly speaking, the subtraction of the best fit reduces the signal spectral density by a factor 
of 1/SNR^. This is true for broad-band and narrow-band signals. Therefore, residual errors 
are too large to allow a measurement of an inflation generated background of gravitational 
waves. Fortunately, though arising from the presence of instrumental noise, the subtraction 
errors are not completely random but mostly confined to the tangent space of the template 
manifold at the point of the best fit. This restriction can be used to define a projection 
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operator on the tangent space that cancels out all parts tangential to the manifold and 



12|. 



hence most parts of the residual error 

In section IVI A[ we give a short introduction to matched filtering and briefly point out 
the connection to differential geometry. Section IVIBI shows the derivation of the first-order 
approximation of the maximum likelihood estimator in case of high signal-to-noise ratio 
which provides important formulas and justifies the use of differential geometry. In the last 
section, we present the actual method of projection. 



A. Matched filtering and differential geometry 

Most of the NS /NS signals observed by BBO will have amplitudes roughly two orders of 
magnitude smaller than the amplitude of the instrumental noise. Therefore some technique 
of filtering must be used to extract the information from the noisy data. Post-Newtonian 
expansions up to order 3.5 of the equations of motions of stellar objects, such as compact 
binary systems, yield very accurate waveforms throughout the BBO detection band, which 
can be employed to search for CBCs in the data streams. The fact that the shape of the 
signals is known to high accuracy is the reason why one can use matched filtering which is 
also known as optimal filtering since it provides the highest signal-to-noise ratio (SNR) of 



all linear filters 
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The detector outputs can, in two ways, be regarded as a vector. The first way which 
will be introduced in this section helps finding an expression for the SNR. In this case the 
detector output is a vector whose components are the outputs of the different detectors 
in different TDI channels. For LISA-type detectors the output vector could be T^{t) = 
{T2{t),T^{t),T^{t)) which is the one-detector case, for BBO the vector has 12 components, 
respectively. In case of a successful detection of a GW, the detector output is the sum of 
the signal Tj^{t) and additive, stationary, Gaussian noise ^"(t), which are vectors in the 
same manner as described above. Optimal filtering of the data stream now means folding 
the output of the detector with a filter function k{t), and normalize it with the correlation 
of the instrumental noise. Henceforth, for ease of notation, we will drop the tilde "~" over 
frequency-domain functions and distinguish between time- and frequency domain functions 
by means of their arguments. Then, the multi-channel SNR can be defined in terms of scalar 
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products between channel vectors: 

nns/d/A(/)-T"(/) 
The optimal filter function k{f) can be cast into the form 

k{f) = [f^iXJ)]HSTHf) (27) 

where T™(A, /) is the Fourier transform of the GW signal parameterized by a set of param- 
eters denoted by A, the dagger means conjugate transpose. is the Fourier transform of 
the noise covariance matrix and its diagonal elements are the (double-sided) power spectral 
densities of the noise in the corresponding TDI channel. Hence it follows that 

J) _ Jd/|f-y-/)|t.(g-r'(/)-n/) _ 

rms/d/|r»(A,/)lt-(S«)-l(/)-r«(/) 

T^(/) the Fourier transform of the detector output, T"(/) the noise in Fourier space. 

In the following we assume as in the previous sections that the noise of different detectors 
and channels is uncorrelated, and that we use the optimal TDI configuration. The optimal 



channels typically called A, E and T [48| are all statistically independent and hence, have 
uncorrelated noise contributions. In this case the correlation matrix of the noise is diagonal 
and with a new optimal filter function k^{X, f) = K™(A, /)]*/ Sf{f) the SNR can be rewritten 
in the form, 

EIdfK{\,f)mf) 

SNR(A) = — ^ (29) 

rmsE/d/fc,:(A,/)^°(/) 

i 

Here Sf{f) is the noise spectral density in the ith optimal TDI channel. The index i runs 
over all detectors and channels. 

However, at this point it is advantageous to regard the data of each channel T^^ as a vector 
itself. Since a detector will sample data at a fixed frequency fg (~ 10 Hz for BBO), each 
data stream will comprise N = fs-T measuring points, where T is the total observation time 
(typically 10^ s which is BBO's lifetime). Each measuring point at time tk = k/ fs can be seen 
as a component of an dimensional vector Tf = {Tf{ti), Tf(t2), . . . , T^(T)) in the vector 
space Vj of all detector outputs of channel i. In the first place, it is this definition of a data 
vector which underlies the geometrical interpretation presented in the following paragraphs, 
not necessarily the gathering of detector outputs into a channel vector. The outputs of all 
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channels form a 12 A^-dimensional vector space V which formally can be thought of as a 
direct sum of the Vj. The instrumental noise and the gravitational wave signal T^'^ are 
vectors in the same manner. Due to the fact that each binary signal is described by a set of 9 
parameters (neglecting spin and eccentricity), the complete signal formed by about 10^ — 10^ 
NS/NS with redshifts z <% will be parameterized by A'^p = 10^ — 10^ parameters and hence 
will lie on a submanifold in V with dimension A^p. 

In case of stationary, Gaussian instrumental noise the matched filter induces an inner 
product on V that is defined by 

oo 

(30) 

* 

It is straightforward to show that the ensemble average of T" ^ ^T"^ | /I^ is equal to 
g\h) for an ensemble of realizations of instrumental noise, which can be proved by using 



T^{f)T'^{f') = S{f — f')S'^{f) (see also 12|, |46|]). In terms of this inner product, the optimal 



signal-to-noise ratio, with help of the above property, can easily be written as 

SNR(A) = (f'"(A) I f'"(A) y (31) 



The inner product defined in equation fl5U]) has the same features than the scalar product in 
Euclidean vector spaces, i.e. it is positive-definite, and can therefore be used as a measure 
of distance and angles within the vector space. The length in Euclidean space corresponds 
to the total SNR Eq. fl3Tl) collected by all channels. Angles quantify the correlation of two 
outputs, e.g. two outputs are orthogonal if their correlation vanishes. Thus the definition 
of an inner product enables one to establish a geometrical description of the problem of 
filtering. 

In the high SNR limit, the best-fit parameters are found by maximizing the like- 
lihood function, see Eq. fl36l) . which is equivalent to minimizing the inner product 
(t^ — T™(A) I — ^^"(A) ^, which can be considered as the distance of the detector output 
to the template manifold. Hence the best-fit template waveform is the one which has least 
separation to the detector output. In other words, geometrically, the best fit corresponds to 
the projection of the output onto the submanifold M, for more details see sections IVI Bl 

and ivrn 

To finish this section it should be said that generally the template manifold is not flat 
but rather has a curvature varying with the values of the parameters of the true signal. Like 
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in general relativity this can be addressed by introducing a metric on the manifold. One 
possible choice for the metric is the covariance matrix of the parameter errors which also is 
the inverse of the Fisher information matrix (see section B), defined by 

/ gf-(A) , gf-(A) \ 
^"^-\^A^'^A^/ ^^^^ 

This statement will not be proven in this paper but the interested reader shall be referred 
to [461. 



B. Maximum likelihood estimator in the high signal-to-noise ratio limit 

In this section we briefly derive the deviation of the best-fit parameters A from the true 
ones Ao in the limit of high SNR, assuming that the signal model is accurate. The outcome 
will give the desired results in terms of the geometrical quantities introduced in the previous 
section. To obtain the results we use the Bayesian estimator and write the exponent of the 
posterior distribution as a Taylor series around the best-fit parameters. For more information 



and different ways of deriving the parameter errors see [38|]. 

First consider the Bayesian estimator of the parameter error that we name e* = A — Aq 
and is defined by, 

(e") = j d^Pee°p(e'|f^) (33) 

The function p{e\T^) is the posterior distribution which determines the probability of a 
model determined by parameter- value deviations e from the true signal, given a measurement 
T^. The connection between posterior and likelihood function is the following 

p{e\r)=Afpoie)p{r\e), (34) 

where AA is a normalization constant, Po{^) comprises the prior knowledge of parameter 
values and p{T^ \e) is the likelihood of data given a model e. At the moment, only flat 
a priori probabilities are put into our simulation which means Po(e ) = 1 and the estimator 
can be written as 

(e") =Ar j d^PeeXf^le-), (35) 

depending only on the likelihood function. In this case the normalization constant is the 
inverse of the integral over the likelihood over the whole parameter space. 

33 



As shown in 36|] , the hkehhood can be expressed with help of Eq. (130|1 as 

p{r\\) oc exp (^-i (f^ - f-(X) I r - f-(A) )^ (36) 

Due to the fact that the data is a sum of instrumental noise and a signal, one can rewrite the 
argument within the brackets of Eq. (1361) as T° + (5T™ where = T™(Ao) — T™(A). Here, 
T™(Ao) represents the true signal in terms of the accurate model. Now, the difference in 
the waveforms can be expanded in a Taylor series as 

- <5f - = a,f-(Ao) e° + ^ 9,9^f-(Ao) e'^e^ + 0(e=^) (37) 
Inserting Eq. (137|) into Eq. (l36l) and introducing normalized waveforms A/"™ = T™/yl and 

/ ^ ^ ^ ^ \ 1/2 

A/""' = S^r'"/^ with A = (T'^(Ao) I T'"(Ao) ) , which helps to highlight the dependence 



of the equations on the SNR, yields the following approximation of the likelihood: 

(38) 



p{r\e) oc exp ( - i [ ( r° I r° ) - ( 1 Ar-(Ao) ) 



+ A' {{M- I A/? ) - ^ {f"" I AT- . 6-6^ + 0(6^)] ) 



The expansion can be cut off at order because higher order corrections in the exponent 
would correspond to higher order corrections to the estimator and we are only interested 
in the first order terms. One sees that for high SNR the likelihood is approximated by a 
multivariate normal distribution. The first summand is just a constant and will be absorbed 
into the normalization constant A/", the second term shifts away the maximum of the distri- 
bution from the true parameters due to instrumental noise, whereas the third term contains, 
in round brackets, the inverse of the covariance matrix of the errors and mainly determines 
the width of the distribution. The correlation of the noise with the second derivatives of 
the signal gives a first correction to the Fisher matrix as the inverse of the covariance but 
as will be clear from Eq. (139| ) one can neglect this correction for high SNR since it scales 
with 1/A compared to the Fisher matrix. 

With all that at hand one can compute the Bayesian estimator by solving the integral 
over the likelihood, which is a lengthy but straightforward calculation. Here we present just 
the result, 

-1 



(39) 



pa/3 I fn I 
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This equation reveals that, as promised, the deviations from the true parameters are com- 
pletely determined by the template-manifold metric and the length of the projection of the 
noise vector onto the tangent space of Ai at the point of the best fit. Also, the parameter 
errors decrease with 1/SNR. We stress again that this result is obtained as a first-order 
SNR approximation which is supposed to be sufficient for BBO as the expected SNRs are 
high enough to justify this approach. Anyway, higher order expansions and the infiuence of 



prior information on the estimator can be looked up in 



371 ] and 



C. The Projection Operator 

In the last two sections, we outlined a strong connection between differential geometry 
and methods used in data analysis of gravitational waves from compact binary objects 
such as neutron-star neutron-star binaries. Investigations of matched filtering of data led 
to a definition of a scalar product on the vector space V of detector outputs. Modelling 
the waveforms by post-Newtonian templates smoothly depending on a set of parameters 
describing the physical properties of the binaries, the detector as well as their relative motion, 
confines the possible outputs generated by a gravitational wave to a submanifold Ai within 
V. The errors occurring at the estimation of the signal parameters are then completely given 
in terms of geometrical quantities such as projections onto and within the tangent space of 
Ai at the best-fit parameters. 

In this section, we further exploit the geometrical restrictions on the waveforms and 
present a successful implementation of a projection method to cancel the subtraction noise 
occurring by subtracting the best fit from the data stream. Eq. ( 1391) showed that the 
expected parameter errors are proportional to 1/SNR which can be used to see how the 
amplitude of the subtraction noise depends on the SNR. The following equation provides a 
Taylor expansion of the template waveform around the true signal T™(Ao) evaluated at the 
expected error: 

f-((e)) = f-(Ao) + tr(Ao)(6") + f-(Ao)(6")(6^) + ^(SNR-^) (40) 

Making use of the fact that T™(Ao) and all derivatives of T™ depend linearly on the SNR, 
one finds that the first term in this expansion is proportional to SNR, the second which is 
the leading term of the subtraction noise is independent of the SNR which can be seen, too. 
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by computing the mean norm of the subtraction noise to leading order in the SNR 



Again, A'^p is the number of parameters describing the total signal, or in other words the 
dimension of the template manifold. The mean spectral density of the subtraction noise in 
each channel is most suitably expressed in terms of the ratio S{5T^] f)/S^{f) of spectral 
densities of the subtraction and the instrumental noise. In the simplest possible case, we 
could consider a signal with constant ratio S{6Tj^; f)/S^{f) for each frequency within 
a given signal bandwidth and negligible ratios outside the bandwidth. The integral which 
yields the scalar product in Eq. ( HTl) is converted into a sum of these ratios over all frequency 
bins at / = 1/T, . . . , fs/2 within the signal bandwidth. For BBO which furnishes data from 
8 channels sensitive to GWs and which permits a total observation time of T = 10*^ s, and 
assuming a total signal of 10^ CBCs {Np ~ 10^) is contained within a bandwidth of 1 Hz, 
this would lead to S{6T^; f)IS"{f) = 0.5iVp/(8 ■ 1 Hz ■ T) ~ IQ-^. A cosmic gravitational 
wave background with energy density Q > 10^^^ would have a spectral density which is more 
than one order of magnitude less than the weakest possible subtraction noise level. This 
level is based on an optimal search of the CBCs which cannot be performed even if a steady 
development of computational facilities over two or three decades in accord with Moore's 
law is assumed. So the subtraction noise, if remaining within the data, would prohibit the 
detection of an inflation generated background with high certainty. 

Eq. ( l40i) also shows that the third term is proportional to 1/SNR. So in deleting the zeroth 
and first order terms from the data stream one reduces the signal strength by a factor of 
1/SNR^. The first order term is a linear combination of first derivatives of the signal, or in 
other words a vector lying in the tangent space of M. at the true parameter values. Since 
the expected parameter errors scale with 1/SNR, it follows that the two tangent spaces at 
the best-fit and the true signal can be regarded as nearly identical. So from now on, all 
derivatives are taken at the best-fit parameters which are obtained by searching the data 
for signals and estimating model parameters. The leading term of the subtraction noise is 
taken as a vector in the tangent space at the best fit. 

Fig. (fTTj) schematically shows the template manifold A4i of signals contained in the data 
stream of channel i, the true signal T^^, the best-fit T^^{\) and the parallel and perpendicular 




(41) 



Np 
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FIG. 11: (Color online) The figure represents the template manifold Alj of a specific TDI channel 
i. Since the best-fit parameters A are estimated with information from all channels, the projection 
of data in channel i does not yield the best-fit waveform 7^™(A). It is assumed that the true 
signal lies on the template manifold at the marked point. The noise vector which points from 
the true signal to the measured data, is split into its components parallel and perpendicular to the 
manifold at the true signal. The (tiny) vector of the CGWB is not shown. 

parts of the instrumental noise. In a specific TDI channel, the projection of instrumental 
noise generally does not coincide with the difference between the best-fit template and the 
true signal, as it is the case in the complete vector space V. That is due to the fact that 
by using information from all channels to find the best-fit, your estimate will be somewhat 
better than if you had determined the parameters just with one data stream. Important to 
say is that because the best fit subtracted from the data in each channel is determined by 
the same best-fit parameters, the subtraction noise will be correlated in all channels and not 
be deleted by a cross-correlation measurement between channels, see section I VI II But good 
news is that most of the correlated error is restricted to the tangent space of the manifolds 
M.^ at the best fit, and projection out tangential directions of the residual data will delete 
the correlations. The tangential part of the CGWB which gets projected out is negUgible 

Q. 

To perform this task one can define a projection operating on a specific channel which 
removes the tangential parts: 

?, = \-vf\dj:nkdflr\ (42) 
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where F""^ denotes the inverse of the Fisher matrix F^^ = (^daT™{\)\dpT^(\)'^ . The 
projected data stream can be calculated, e.g. in the frequency domain, according to 

Here, T^'^ denotes the residual data which remains after subtraction of the best fits. In case 
the initial data stream contains a GW foreground generated by CBCs, a CGWB 7^^ and 
instrumental noise 7^°, then after subtracting the best fit and projecting the data, what is 
left is 

P(7:-^) = T-^ + + 0(7:VSNR2) (44) 

the perpendicular parts of the instrumental noise and the cosmic GW background. The 
loss of power in 7^^ is negligible as already mentioned and the remnants of the instrumental 
noise will be removed by cross-correlating the data of the two collocated detectors in the 
star-of-David configuration. 

We want to point out another interesting property of the subtraction-noise projection. 
For arbitrary template manifolds, the best fit has to lie close to the true signal. If that is not 
the case, then tangential planes at the best fit and the true signal do not coincide. This could 
entail a poor performance of the projection operator Eq. (H2l) in removing residual power of 
the signal spectrum. Now imagine a fiat template manifold. In that case, the best fit does 
not have to be accurate, since tangential planes coincide at all points on the manifold. In 
fact, since the vanishing signal is always an element of the model (e.g. coming from a source 
which is far away), the projection method works without having to estimate the signals at all! 
The only information which is needed is the number of signals. This information determines 
the right dimension of the manifold. Although a completely fiat metric does not represent 
most analysis problems, we can already see in (signal subtraction noise projected 
signal) spectra shown in section IVIII Al that the power of the projected spectrum is very 
low at all frequencies independent of the power of the subtraction noise. Obviously in our 
simulation, some parameters of some signals were poorly estimated, but their subtraction 
noise is removed with high accuracy. This can just be explained with a sort of "partial" 
flatness of the template manifold. Certainly, this feature needs to be investigated in the 
future. 
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VII. CROSS CORRELATION OF TDI CHANNELS 



In section IVB I we have seen, that the CGWB is completely covered by the detector noise. 
One needs to reduce the difference between noise and background spectral densities by a 
factor of 10^ in the case of a CGWB with Q = 10^^^. In this section we will describe, how 
that reduction can be done by cross correlating the TDI streams of the collocated detectors. 
Increasing the SNR of the background is achieved by increasing the observation time until 
the correlation output is dominated by contributions from the CGWB. One can study the 
correlation measurement for small observation times T and then extrapolate the output for 
higher T by making use of a simple scaling law of the SNR with observation time. We will 
see that performing a correlation measurement with data gathered over 3 years, which is 
BBO's proposed mission lifetime, it is possible to detect a CGWB with energy densities 
below Q = 10~^^. These results are based on the assumption that the subtraction noise 
from the CBC foreground can be removed with sufficient accuracy. Our results which are 
presented in section IVIIII show that this is indeed the case (at least for 100 NS/NS). 

We consider the TDI channel output 'T^{f) as a sum of the CGWB and of the detector 
noise, 

^^(/) = ^'(/) + W), (45) 

where the noise is assumed to be Gaussian, stationary and uncorrelated between different 
channels, and the foreground signal is subtracted below the CGWB. Unlike the instrumental 
noise, the CGWB will be correlated in different channels to some degree which can be 
predicted by the ORF. In the frequency domain, the expected outcome of a correlation Cij 
between channels i, j is an integral over all frequencies of the data-stream product 



^ J2 5^l5W7m g^^.(^) ^^^^ 

where a channel-dependent filter function Qij is used to suppress contributions from fre- 
quencies with strong instrumental noise or weak (expected) CGWB. According to Eq. (ITQll . 
the expectation value {Cij) of the correlation measurement for sufficiently long observation 
times is 

/ 
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where 5*^ is the GW strain spectral density. The variance of correlation noise which is 
dominated by contributions from the instrumental noise is given by 

((AC,,)^) = 5^5r(/)^°(/)Q?,.(/) (48) 
/ 

Now, we can understand how in general the correlation signal-to-noise ratio SNRjj = 
{Cij) / {{ACij)'^) scales with observation time T. The number of frequencies (frequency 
bins) summed over in Eq. ( 147|) increases proportional to T whereas the standard deviation, 
which is the square root of Eq. (HSj) . scales with -\/T- Therefore, increasing the observation 
time, one eventually raises contributions from the CGWB above the expected deviations. 
If the range of frequencies A/ contained in the sums is small enough, then the functions 
within the summands can be taken as constants and the two equations, evaluated at a 
fiducial frequency /o which lies within the bandwidth, become 

(Q,)^(T.A/)7.,(/o)^^/o)Q.,(/o) 

So, in this small-bandwidth approximation, the SNR is independent of the (constant) filter 
function Qij 

SNR,,- = y/TrAf^^0i£M^ (50) 



'Sfifo)Sfifo) 

If the small-bandwidth, flat spectrum approximation does not hold, then there exists an 
optimal filter which is based on models for the noise spectral densities in the two channels, 
the ORE and the spectral density of the stochastic background. Its purpose is to suppress 
contributions from frequencies which would contribute strongly to the instrumental noise in 
Cij, but weakly to the GW correlations. Accordingly, the optimal filter is given by [l^ 

The filter function between channels Aq and Ai based on a fiat Q = 10~^^ model for the 
CGWB is shown in Eig. [121 In addition, one should keep in mind that the WD/WD barrier 
enforces a lower boundary on correlation frequencies. The filter maximum lies at 0.2 Hz 



which is half-way inside the WD/ WD spectrum. In fact, most models presented in 17 1 
predict a cosmological distribution of WD/WD which gives rise to an energy density of 
Q ~ 10^^^ - 10^^'^ at 0.2 Hz, which would make a detection of a cosmological background 
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FIG. 12: The graph displays the filter functions QAoAiif) and QE^Eiif)- It contains models for 
the instrumental noise spectral densities (given in Eq. (jl2p ) and is based on a flat- $7 model of the 



impossible at these frequencies unless the WD/WD signals were resolvable. Assuming that 
the WD/WD foreground at frequencies 0.2 Hz cannot be analyzed, one has to find out if the 
maximum of the filter does determine the most efficient correlation frequencies. This is not 
the case, but as we will see, efficient frequencies are not much greater than suggested by the 
filter. Optimally, one had to design the instrument such that the most efficient frequencies 
lie above the WD/WD barrier. To find a definite answer to this problem, one has to calculate 
the contribution from certain frequencies to the SNR expected from a specific model of the 
CGWB. The SNR for the optimal filter assumes the form 



For a network of detectors, one would sum over all independent correlation pairs {ij) to 
obtain the total network SNR 



CGWB. 




(52) 



SNR, 



'tot — 




(53) 
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FIG. 13: The curve shows SNR^q^.(/) as defined in Eq. (j53|) . It includes contributions from the 
two statistically independent pairs Aq Ai and Eq ^ Ei. This curve does not depend on the 



frequency bin fi = i/T within a chosen correlation bandwidth (a lower boundary for this band is 
set by the WD/WD barrier, the upper boundary is ultimately set by half of the sampling frequency 
/s). For example, neglecting the WD/WD barrier and approximating the area under the curve by 
a rectangle (0.3 Hz) x (0.003), a SNR^ = 25 would be obtained after T = 25/(0.3-0.003) s 3- lO'' s. 

Fig. [T3] shows SNR^Q^.(/) including the two correlation pairs Aq ^ Ai, Eq Ei of BBO. 
The maximum of this curve is shifted towards higher frequencies with respect to the filter 
maximum, because the additional brings in a factor at frequencies below 1 Hz and the 
additional background spectrum a factor /~^. So, in total, the filter spectrum is multiplied 
by a factor /. Most of the SNR is collected at frequencies near 0.23 Hz which may still be 
a bit too low. Certainly, this issue needs to be investigated in the future. 

VIII. RESULTS 

Our results are presented in two ways. In the first part of this section, we show subtraction 
noise and projected signal spectra and compare them in total power. In the second part, the 
outcome of correlation measurements between the two collocated detectors is summarized in 
tables, and we compare contributions from instrumental noise, CGWB and CBCs. Also, the 
decrease of CBC correlations by subtraction of best fits and noise projection is investigated. 
A sensitivity of BBO to stochastic backgrounds is derived and extrapolated to an observation 



observation time T. The total SNR^ 



can be calculated by adding values of SNR^qj(/) for each 
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time of 3 years. 
A. Projection 

In this section, we focus on two salient features of the projection results, which are pre- 
dicted by theory. First, the projected spectra are compared with subtraction-noise spectra 
to find that the residual signal power is sufficiently small to enable CGWB detection within 
a certain frequency range. Second, we show that the projection operates selectively on sub- 
traction noise and leaves stochastic signals like the instrumental noise and the CGWB more 
or less unaffected. 

In Fig. [131 the CBC spectrum together with the subtraction noise and its projection 
are displayed from 0.1 Hz to 1 Hz. With a few exceptions, the subtraction noise is weaker 
than the signal spectrum. The projection is applied to 17 out of 100 signals to remove the 
subtraction noise between 0.2 Hz — 0.5 Hz. The peaks in the subtration-noise spectrum 
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FIG. 14: The figure shows the recorded signal spectrum (dotted), the subtraction-noise spectrum 
(dashed) and the projected subtraction-noise spectrum (solid) between 0.1 Hz and 1 Hz. The sub- 
traction noise is comparatively high at low frequencies, since pN waveforms are determined by 
strongly correlated parameters. These correlations decay at higher frequencies where a consider- 
able part of the phase evolution is observed leading to better waveform estimates. Subtraction 
noise within the correlation band 0.2 Hz — 0.5 Hz is projected. As one can see, all peaks of the 
subtraction noise are removed. The residual noise is negligible compared with a CGWB spectrum 
with fractional energy density O = 10~^^. 
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are removed. The residual spectrum lies below a CGWB with Q = 10~^^ (see Fig. [TOl) . 
Remarkably, the projection works accurately although some of the CBCs in the correlation 
band are poorly estimated, which can be seen by comparing the subtraction noise with the 
signal spectrum: the weaker the subtraction noise, the better the best fit. Simulating longer 
observation times would significantly improve the gain of the noise projection. We conclude 
that the subtraction noise will pose no problem to future-generation detectors as long as all 
foreground signals can be detected and modelled accurately. 

Projecting the data means to remove all contributions which are correlated with certain 
template derivatives. These derivatives are associated with directions in a sampling space. 
A stochastic process has the property to distribute its energy randomly along all directions 
of the sampling space with a given mean value of the energy per direction, which is the 
geometrical interpretation of the fact that a stationary process has constant variance. If we 
project out A'^p directions of the sampling space, then, in average, we will remove a fraction 
Np/N of the power of any stochastic, stationary process, where is the dimension of the 
sampling space which is the number of samples. As long as the number of projected directions 
is much smaller than A^, the loss in power will be negligible. In our case, we project along 
17 X 9 (17 CBCs with 9 parameters each) out of 10^ x 5.24288 (observation time multiplied 
with sampling frequency) directions. The mean predicted power loss of stationary processes 
like the CGWB or the instrumental noise is 0.03% which is indeed insignificant. In Fig. [151 
we plot the spectrum of the total data before (upper figure) and after (lower figure) data 
analysis including the noise projection. There is no visible change in the instrumental noise 
spectrum, but all signal peaks between 0.2 Hz and 0.5 Hz are removed. More quantitative 
results can be found in the next section which presents correlation values of all components 
of the total data. 

B. Correlation 

The correlation measurement is evaluated in two ways. At first, we compute statistics 
for different observation times to make it possible to extrapolate our results to a 3 year 
observation time with certain confidence. These statistics are based on simulations without 
the CBC foreground. The reason is that the computation would take too long and also, 
the projection performance is insufficient at observation times much shorter than T = 10^ s. 
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FIG. 15: The upper graph displays the spectrum of the total data before it is analyzed, whereas 
the lower graph displays the spectrum after best-fit subtraction and noise projection. Some of the 
signal peaks in Fig. [T3] can be identified in these two spectra. There is no visible change of the 
instrumental noise level. Here, every 20th frequency bin is plotted to reduce the amount of data. 



because the Fisher matrices are highly ill-conditioned and parameter correlations are too 
strong to allow a meaningful estimation of signal parameters by means of Eq. ( |T3l) . Second, 
we investigate the correlation including the CBCs and noise projection for an observation 
time T = 10^ s. 

As mentioned in earlier sections, the correlation measurements are carried out between 
channels Aq ^ A\ and Eq ^ E\ and subsequently, the two values are added to form the 
total correlation SNR. Each correlation value corresponds to an integral over the correlation 
band 0.2 Hz — 0.5 Hz. The results are shown in Table HTl The first column contains values of 
four different observation times used to obtain the correlation statistics. The second column 
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T 




Ccgwb 


Ctot 


1 • lO^s 


0.00 ±2.60 


5.90 ±0.07 


5.90 ±2.54 


2 • lO^s 


0.83 ± 5.74 


14.72 ±0.12 


15.34 ±5.87 


4 • lO^s 


-0.93 ±6.54 


26.28 ±0.19 


25.56 ±6.42 


8 • lO^s 


1.73 ±9.87 


53.80 ± 0.29 


55.20 ± 10.12 



TABLE II: The table contains the mean values and standard deviations of correlation measure- 
ments between channels Aq <-> Ai and Eq ^ Ei for different observation times. Values in each 
row are based on 20 measurements. Here, the total data is the sum of the CGWB and the in- 
strumental noise. Whereas the mean value of the correlation Ctot of the total data is dominated 
by contributions from the CGWB, its standard deviation is determined by contributions from the 
instrumental noise. The mean value of Ctot is supposed to increase linearly with T, whereas its 
standard deviation - predicted to be the square root of the mean value - is supposed to increase 
with \/r. 

shows the respective correlation outcome of the instrumental noise, the third column of 
the CGWB with VL = 10~^^ and the last column of the total data, which, in this case, is 
the sum of the instrumental noise and the CGWB. Each correlation value is based on 20 
measurements. In summary, the evolution of the mean values and standard deviations of the 
correlation with observation time confirms theoretical predictions. The mean value of Ctot 
increases approximately linearly with observation time T and its standard deviation increases 
with the square root of T. Furthermore, the standard deviations of Ctot are always close to 
the square root of the mean value which would ideally hold, since the optimal filter defined 
in Eq. (15T]) is applied. The fact that the measured standard deviation is somewhat higher 
than the predicted value means, that our noise model which governs the correlation filter 
is weaker than the actually measured spectrum of the instrumental noise. This descripancy 
can be explained since the (equal-arm) noise model is not used to generate time series of the 
noise in our simulation. In reality, one would take greater care in choosing the right noise 
model such that variances are equal to mean values, and the SNR is faithfully calculated 
for a single measurement. Alternatively, one may use the total spectrum after projection as 
noise model. In our simulation, we obtain the SNR by averaging over many measurements 
and the quality of the noise model does not have to be very high. 
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Before extrapolating to higher observation times, we have to make sure that subtraction 
noise can be removed from the data. Therefore, we show correlation values for our longest 
run with T = 10^ s related to the CBC: 

Ccbc = 19000, aub = 3560, Cproj= 2.34 (54) 

The correlation value of the CBC without best-fit subtraction and projection is Ccbc- When 
the best fits are subtracted, this value is reduced to Csub- By consequence, a CGWB with 
Q = 10"^^ could not be detected without noise projection, since the CGWB correlation 
is around 55. The correlation of the projected noise is further reduced to Cproj which lies 
well below the CGWB value. The background becomes detectable. A remaining problem 
is that the true data will contain about 100 to 1000 times more signals which leads to a 
similar increase of the projected spectrum, but two effects are working for the good of the 
mission. Theoretically, the projected spectrum should decrease with 1/T which is due to 
an improvement of the accuracy of the estimated signal parameters. In addition, we believe 
that our projection results would be much better, if a different template class was used or 
if we had implemented the F statistics to maximize over nuisance parameters, since our 
waveforms are governed by templates with highly correlated parameters. These issues will 
have to be scrutinized in the future. 

Now, assuming that the foreground subtraction noise can be projected out, we extrapolate 
correlation values to an observation time T = 10^ s. Our measurments confirm that the SNR 
scales with the square root of the observation time, and so we extrapolate our results of 
Tab. ini accordingly for each observation time and average over the four different predictions 
to obtain BBO's SNR corresponding to a flat stochastic background (i.e. constant Q): 

SNR(3 years) ^ 200 • (55) 

One has to keep in mind that this prediction is based on a restricted correlation band 
0.2 Hz — 0.5 Hz. The SNR would double if the entire detection band especially towards lower 
frequencies was chosen as correlation band. The lower boundary is understood as cut-off 
frequency due to the WD/WD foreground which may not be analyzable below 0.2 Hz. The 
upper boundary has no significant effect since high-frequency contributions to the SNR of 
the cosmic background are negligible. If one demands a minimal SNR of 5, then BBO's 
sensitivity is 

n^in ^ 2.5 ■ 10^^^ (56) 
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Given a lower bound of 0.2 Hz for correlation measurements, there should exist an optimal 
arm length of BBO which is slightly smaller than 50000 km, and which leads to a slightly 
improved sensitivity. It may also turn out that our choice for the lower boundary was too 
pessimistic and that BBO is more sensitive even without modifications. 

IX. CONCLUSION 

The analysis presented in this paper serves three main purposes. The primary intention is 
to demonstrate the application of a subtraction-noise projection on a simulated data-anaylsis 
problem. The method is explained by invoking a geometrical interpretation of optimal sig- 
nal detection and parameter estimation in the presence of additive, Gaussian noise. Second, 
we showed how to construct an analysis pipeline for a time-delay interferometer network, 
which seeks for a stochastic gravitational-wave background in the presence of a foreground 
built of compact-binary coalescences (CBCs). The detector network is simulated dynami- 
cally, thereby automatically including detector motion and time-varying response functions. 
Therefore, the definition of the overlap-reduction function had to be generalized so that it 
directly quantifies the total instrumental infiuence on correlation strengths of detector out- 
puts, instead of quantifying the correlation of projected gravitational-wave induced strains 
in terms of somehow normalized quasi-stationary detector response functions. Third, since 
our simulation is based on a design proposal for the BBO detector network, we derive a 
prediction of its sensitivity towards stochastic backgrounds, which should be more robust 
than values obtained from previous investigations. 

The simulation creates time series of the instrumental noise and compact binaries for each 
photodiode in the network. During observation, response functions and detector positions 
change. We use equations of the orbital motion of each satellite which are specified up 
to second order in the orbital eccentricity. Consequently, each triangular configuration 
of satellites is simulated with cartwheel and breathing motion. The isotropic stochastic 
background with a given spectral density is directly generated in the frequency domain, 
by first computing transfer functions and overlap-reduction functions of and between each 
network output. These frequency-domain functions are obtained via FFT of the detectors' 
impulse responses. 

Based on the assumption that all foreground signals in the data from compact binaries are 
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detected, we found as expected, that by removing the estimated foreground from the data, 
the residual foreground spectrum due to inaccurate fitting of waveforms covers the spectrum 
of the cosmological background and makes it impossible for BBO to detect it, unless its 
fractional energy density assumes very high values. We implemented a subtraction-noise 
projection into the pipeline, which allows to accurately remove the subtraction noise. In 
simulation runs with T = 10^ s and operating on data with 100 CBCs, the total power 
of the projected signal is 4 orders of magnitude weaker than the original signal spectrum, 
which is much better than previously expected since the analysis is based on a comparatively 
short observation time which entails strong correlations between different parameters of the 
signal model. This outcome can just be explained by assuming that the template manifold 
is approximately flat, at least along certain directions. The smaller the curvature of the 
manifold in the vicinity of the true signal, the weaker are the requirements on the accuracy 
of waveform fitting. A detailed analysis of the template-manifold curvature will have to 
be carried out in the future to understand how precise a best fit has to be such that the 
respective subtraction noise can be projected out. 

If one wants to extrapolate this result to higher observation times and more CBCs, then 
one needs to separate the detection and estimation problem from the projection problem. 
Certainly, it will be much more difficult to detect all signals of a realistic foreground, and for 
a more realistic analysis scheme, like the hierarchical search, one has to study the influence 
of confusion noise on the quality of the waveform estimates. The final answer depends 
on how close one will come to an optimal CBC analysis. We do not intent to make any 
predictions here concerning this point. However, assuming that all CBCs are detected and 
accurately analyzed, we claim that the projected subtraction-noise spectrum of a realistic 
foreground observed over 3 years will be negligible. First, we argued in our paper that 
confusion noise is no issue in the subtraction-noise projection, since the projection is based 
on the total Fisher matrix of all projected signals including mutual correlations into the 
model. Only if excluded, correlations between signals manifest themselves as confusion 
noise which may deteriorate the projection. Second, Fisher matrices which characterize 3 
years of data will be much less ill-conditioned and for that reason much easier to handle 
numerically. That is the main reason why we consider our projection results very promising: 
we obtained good results choosing templates with strong parameter correlations. Third, and 
more fundamentally, the projected spectral power is supposed to decrease with 1/T which 
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further reduces the projected spectrum by a factor 1000. 

Now, applying simple, numerically confirmed scaling laws of the correlation SNR, we 
extrapolate correlation results obtained for a few runs with observation times below T = 10^ s 
to the full BBO mission lifetime of T = 10^ s. In conclusion, provided that the subtraction 
noise can be removed, the extrapolated sensitivity of BBO to a stochastic background is 
^min ~ 2.5 ■ 10^^''. However, one condition for this result is that the cosmological WD/WD 
foreground above 0.2 Hz can be resolved and fitted like the foreground of NSs and BHs. 
Conversely, it may turn out that the lower boundary of the correlation band is too pessimistic 
which would entail a better sensitivity of BBO (up to a factor of 2). Ultimately, the problem 
of the WD barrier can be ameliorated making the BBO arms shorter by a small factor to 
increase the frequency of optimal sensitivity towards stochastic backgrounds. 
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APPENDIX A: PARAMETER DERIVATIVES OF GW PHASE 

In this appendix, we elaborate on some details concerning the Fisher matrix of 
comparable-mass compact-binary inspirals within the pN formalism. The general frame- 
work is set in section HVl 

The GW phase evolution depends on the three parameters tc, M, fi, and therefore the 
respective Fisher-matrix components depend on the derivative of the GW phase with respect 
to these parameters. We found that the result can be cast into a form which resembles the 
pN expansion Eq. (I9l). In the following, these results will be presented. We start with the 
derivative of the phase with respect to the chirp time. It can be written as a sum 
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All symbols are defined as in section HVi A similar expression can be found for the derivatives 
with respect to the two mass parameters. The two expansions 
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